SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation
Authors
Abstract
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools implement only some of these manipulations, and not particularly efficiently, and some are available only for certain operating systems. Furthermore, the complicated installation of required packages and runtime environments can make these programs less user-friendly. This paper describes SeqKit, a cross-platform, ultrafast, comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be used directly without any dependencies or pre-configuration. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.
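To make two of the manipulations named in the abstract concrete, the following is a minimal Python sketch of FASTA parsing, deduplication by sequence, and filtering by minimum length. It is an illustration only and is not SeqKit's implementation or interface; the function names (`parse_fasta`, `dedup_by_sequence`, `filter_min_length`) and the example records are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type for this sketch: (header without '>', sequence).
Record = Tuple[str, str]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedup_by_sequence(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq


def filter_min_length(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep records whose sequence has at least min_len characters."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


# Made-up example input: seq2 repeats seq1 in lower case, seq3 is short.
EXAMPLE = """>seq1
ACGT
ACGT
>seq2
acgtacgt
>seq3
AC
"""

records = list(parse_fasta(EXAMPLE.splitlines()))
unique = list(dedup_by_sequence(records))
kept = list(filter_min_length(unique, 4))
print(kept)
```

Because each step is a generator, the steps can be chained over a large file without holding every record in memory; only the set of sequences already seen grows with the input.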
Similar resources
Software News and Updates: A Toolkit to Assist ONIOM Calculations
A general procedure for quantum mechanics and molecular mechanics (QM/MM) studies on biochemical systems is outlined, and a collection of PERL scripts to facilitate ONIOM-type QM/MM calculations is described. This toolkit is designed to assist in the different stages of an ONIOM QM/MM study of biomolecules, including input file preparation and checking, job monitoring, production calculations, ...
DendroPy: a Python library for phylogenetic computing
DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation ro...
MRT dump file manipulation toolkit (MDFMT) - version 0.1
The MRT routing information export format represents an effective way of storing BGP routing information in binary dump files. Although a few tools exist to extract data from MRT dump files, most of them do not allow repacking or creating such MRT files. The MRT dump file manipulation toolkit (MDFMT) allows repacking parts of large MRT dump files containing BGP update messages into smaller ones...
MFPPI – Multi FASTA ProtParam Interface
Physicochemical properties reflect the functional and structural characteristics of a protein. The comparative study of physicochemical properties is important for understanding the role of a protein when exploring its molecular evolution. A number of online and offline tools are available for calculating the physicochemical properties of a single protein sequence. However, a tool is not available for a ...
Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit
BACKGROUND: Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolki...